Preserving Fine Phonetic Detail Using Episodic Memory: Automatic Speech Recognition with Minerva2
Abstract
Previous research has demonstrated competitive recognition results on the Peterson & Barney corpus of vowel formant data using MINERVA2, a simulation of episodic memory. This paper presents a modified implementation designed to work on real speech data, and reports results from isolated-word recognition experiments conducted on the TI-ALPHA corpus. It is shown that access to fine phonetic detail is critical for achieving high recognition accuracy, whether it is provided by the episodic model or by hidden Markov models incorporating large numbers of Gaussian mixture components. However, it is confirmed that, although MINERVA2 offers a powerful means of generalizing by accessing the fine detail retained in all the training data, it is severely hampered by its inability to model temporal sequence. It is concluded that a new episodic model is needed, one that is based on the principles of MINERVA2 but overcomes these limitations.
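To make the idea of "accessing the fine detail retained in all the training data" concrete, here is a minimal sketch of MINERVA2 retrieval as it is commonly described in Hintzman's original formulation. It is not the modified implementation the paper reports. The toy memory, the six-feature layout (four pattern features followed by two label features) and the names `similarity` and `echo` are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Trace = Sequence[float]


def similarity(probe: Trace, trace: Trace) -> float:
    """Dot product normalized by the number of features that are
    nonzero in either the probe or the trace."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo(probe: Trace, memory: Sequence[Trace]) -> Tuple[float, List[float]]:
    """Return (echo intensity, echo content) for a probe.

    Every stored trace is activated in parallel by the cube of its
    similarity to the probe; cubing keeps the sign and lets close
    matches dominate the response.
    """
    activations = [similarity(probe, trace) ** 3 for trace in memory]
    intensity = sum(activations)
    content = [
        sum(a * trace[j] for a, trace in zip(activations, memory))
        for j in range(len(probe))
    ]
    return intensity, content


# Hypothetical toy memory: 4 pattern features + 2 label features (A, B).
memory = [
    [1, 1, -1, -1, 1, -1],   # exemplar of class A
    [1, 1, -1, 1, 1, -1],    # another exemplar of class A
    [-1, -1, 1, 1, -1, 1],   # exemplar of class B
]

# Probe with the label fields left blank (zeros); the echo fills them in.
probe = [1, 1, -1, -1, 0, 0]
intensity, content = echo(probe, memory)
label_a, label_b = content[4], content[5]
predicted = "A" if label_a > label_b else "B"
```

In this sketch every stored exemplar contributes to the echo, so no detail is discarded at training time. Each trace is also compared as a fixed-length vector with no time alignment, which is one way to picture the temporal-sequence limitation the abstract describes.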
Similar resources
When Is Fine Phonetic Detail a Detail?
It is our task to take a discussant role in the special session “Sound to Sense: Modelling Fine Phonetic Detail” at ICPhS 2007. The contributions by Moore and Maier [12] and Lecumberri and Cooke [11] inspire further thinking on how fine phonetic details can be successfully explored by humans or by machines. The MINERVA2 system built on the multi-trace (episodic) memory model challenges current ...
Temporal episodic memory model: an evolution of minerva2
This paper introduces a new model for automatic speech recognition (ASR) called TEMM (Temporal Episodic Memory Model). TEMM is derived from a simulation of human episodic memory called MINERVA2. It not only overcomes the inability of MINERVA2 to use temporal sequence flexibly for recognition, but also employs a prediction mechanism as an additional source of information. The performance of...
Modeling recognition of speech sounds with minerva2
This study investigates the extent to which a localist-distributive hybrid formal model of human memory replicates observed behavioral patterns in perception and recognition of appropriately coded language data. Extending previous research that considered for modeled memorization only items with uniform, undefined randomly generated featural specifications, a MINERVA2 simulation was trained to ...
Sound to Sense: Introduction to the Special Session
Sound to Sense (S2S) is a Marie Curie Research Training Network, funded 2007-2011. It involves some 50 researchers in 13 institutions in 10 countries. The ultimate aim of S2S is to provide models of speech processing that closely reflect the exquisite flexibility and robustness of human speech processing (HSP), that pave the way for the next generation of robust automatic speech recognition (AS...
Modelling fine-phonetic detail in a computational model of word recognition
There is now considerable evidence that fine-grained acoustic-phonetic detail in the speech signal helps listeners to segment a speech signal into syllables and words. In this paper, we compare two computational models of word recognition on their ability to capture and use this fine-phonetic detail during speech recognition. One model, SpeM, is phoneme-based, whereas the other, newly developed ...